- 26 November, 1984
-
- hdiff 1.10
- Preliminary Documentation
-
- Purpose
- -------
-
- hdiff is a utility which can compare two standard DOS text files
- and isolate the differences between them. It can produce two
- distinct types of reports on the differences. First, hdiff can
- prepare a simple report of lines which appear in the second file
- but not in the first (insertions), and of lines which appear in
- the first file but not in the second (deletions). Second, hdiff
- can produce a special "report" which is, in fact, an Edlin
- script. This script, when applied to the first file, will
- produce a clone of the second file. This second function of
- hdiff is quite similar to the Unix utility "diff".
-
- hdiff uses a file comparison algorithm which was developed by
- Paul Heckel and described by D.E.Cortesi in Dr. Dobb's Journal
- #94 (August, 1984). The algorithm is substantially more
- efficient than traditional file comparison methods; you will find
- that it can generate a difference report between two files in
- little more than the time it takes to read the two files.
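-
- The sketch below illustrates the line-matching idea behind this
- kind of algorithm as it is usually described: count how often
- each distinct line occurs in each file, pair up the lines that
- occur exactly once in both, then extend those pairs to their
- neighbours. It is not hdiff's source. The names (heckel_match,
- MAXLINES, the linear-search symbol table) are invented for
- illustration, and the usual refinement of treating the start and
- end of each file as matched sentinel lines is left out.
-
```c
#include <assert.h>
#include <string.h>

#define MAXLINES 64                 /* illustrative limit, not hdiff's */

/* One symbol-table entry per distinct line of text. */
struct symbol {
    const char *text;
    int old_count;                  /* occurrences in the old file */
    int new_count;                  /* occurrences in the new file */
    int old_line;                   /* last index seen in the old file */
};

static struct symbol table[2 * MAXLINES];
static int nsymbols;

/* Find or create the entry for a line. A linear search keeps the
   sketch short; a real implementation would hash the text. */
static int lookup(const char *text)
{
    int i;
    for (i = 0; i < nsymbols; i++)
        if (strcmp(table[i].text, text) == 0)
            return i;
    table[nsymbols].text = text;
    table[nsymbols].old_count = 0;
    table[nsymbols].new_count = 0;
    table[nsymbols].old_line = -1;
    return nsymbols++;
}

/* On return, old_match[i] is the index of the matching line in the new
   file (or -1), and new_match[j] is the matching old line (or -1). */
void heckel_match(const char **oldf, int nold, const char **newf, int nnew,
                  int *old_match, int *new_match)
{
    int old_sym[MAXLINES], new_sym[MAXLINES];
    int i, j;

    assert(nold <= MAXLINES && nnew <= MAXLINES);
    nsymbols = 0;

    /* Passes 1 and 2: enter every line and count its occurrences. */
    for (j = 0; j < nnew; j++) {
        new_sym[j] = lookup(newf[j]);
        table[new_sym[j]].new_count++;
        new_match[j] = -1;
    }
    for (i = 0; i < nold; i++) {
        old_sym[i] = lookup(oldf[i]);
        table[old_sym[i]].old_count++;
        table[old_sym[i]].old_line = i;
        old_match[i] = -1;
    }

    /* Pass 3: a line that occurs exactly once in each file must match. */
    for (j = 0; j < nnew; j++) {
        struct symbol *s = &table[new_sym[j]];
        if (s->old_count == 1 && s->new_count == 1) {
            new_match[j] = s->old_line;
            old_match[s->old_line] = j;
        }
    }

    /* Pass 4: extend each match forward over identical neighbours. */
    for (j = 0; j + 1 < nnew; j++) {
        i = new_match[j];
        if (i >= 0 && i + 1 < nold &&
            new_match[j + 1] < 0 && old_match[i + 1] < 0 &&
            new_sym[j + 1] == old_sym[i + 1]) {
            new_match[j + 1] = i + 1;
            old_match[i + 1] = j + 1;
        }
    }

    /* Pass 5: extend each match backward in the same way. */
    for (j = nnew - 1; j > 0; j--) {
        i = new_match[j];
        if (i > 0 &&
            new_match[j - 1] < 0 && old_match[i - 1] < 0 &&
            new_sym[j - 1] == old_sym[i - 1]) {
            new_match[j - 1] = i - 1;
            old_match[i - 1] = j - 1;
        }
    }
}
```
-
- Lines left at -1 in new_match are insertions and lines left at
- -1 in old_match are deletions. Every pass is a single sweep over
- the lines, which is why the running time stays close to the time
- needed to read the files.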
-
- This version of hdiff was derived from Cortesi's demonstration
- program, with substantial modifications which
-
- -- accommodate differences between Edlin and CP/M's Ed (for which
- the demo was written)
- -- allow use of Edlin's block move capabilities
- -- allow for much larger files through the use of all available
- memory
- -- allow the use of command line parameters and switches
- -- allow the user to specify at run time the maximum number of
- lines which will be processed. This allows hdiff to use memory
- more efficiently.
- -- allow the user to request the simpler difference report rather
- than the Edlin script.
-
-
- System requirements
- -------------------
-
- hdiff requires:
-
- -- IBM PC, PC/XT, PC/AT, or other MSDOS machine
- -- MSDOS 2.00 or later
- -- At least 128K of RAM. The more RAM you have, the larger the
- files you can process.
-
-
- Running hdiff
- -------------
-
-
- The general syntax for hdiff is:
-
- hdiff [-e] [-nnnn] oldfile.ext newfile.ext
-
- The optional -e switch instructs hdiff to produce an Edlin script
- file rather than the difference report. The optional -nnnn
- switch assists in memory management; it represents the maximum
- number of lines hdiff will be required to process, i.e., the
- number of lines in the larger of the two files. The default for
- this value is 2000 lines; there is an absolute maximum of 5000
- lines. See the section on memory management for more
- information about this switch.
-
- The two switches may be combined, and they may be in any order:
- '-e -1000', '-1000 -e', '-e1000', and '-1000e' are all
- equivalent.
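-
- A minimal sketch of how switches with these properties could be
- parsed. This is not hdiff's actual code: the function and
- structure names are hypothetical, and rejecting values outside
- 1..5000 is an assumption (the text above says only that 5000 is
- the absolute maximum).
-
```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

#define DEFAULT_MAXLINES 2000       /* default stated in the text */
#define LIMIT_MAXLINES   5000       /* absolute maximum stated in the text */

struct options {
    int edlin_script;               /* nonzero once -e has been seen */
    int max_lines;                  /* value of -nnnn, or the default */
};

/* Parse one argument such as "-e", "-1000", "-e1000" or "-1000e".
   Returns 1 if it was a valid switch, 0 if it is not a switch at all
   (a file name, say), and -1 if it is a malformed switch. */
int parse_switch(const char *arg, struct options *opt)
{
    const char *p;

    assert(arg != NULL && opt != NULL);
    if (arg[0] != '-' || arg[1] == '\0')
        return 0;

    p = arg + 1;
    while (*p != '\0') {
        if (*p == 'e') {
            opt->edlin_script = 1;
            p++;
        } else if (isdigit((unsigned char)*p)) {
            char *end;
            long n = strtol(p, &end, 10);
            if (n < 1 || n > LIMIT_MAXLINES)
                return -1;          /* assumed behaviour: reject */
            opt->max_lines = (int)n;
            p = end;                /* continue after the digits */
        } else {
            return -1;
        }
    }
    return 1;
}
```
-
- Calling parse_switch on each argument in turn, starting from an
- options structure holding { 0, DEFAULT_MAXLINES }, gives the same
- result for all four spellings listed above, because the letter
- and the number are each handled wherever they appear.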
-
- Examples:
-
- hdiff foo.c newfoo.c
-
- compares file 'foo.c' with file 'newfoo.c' and produces a simple
- report showing insertions (lines in newfoo which do not appear in
- foo) and deletions (lines in foo which do not appear in newfoo).
- Lines which have been moved but are otherwise unchanged do not
- appear in this report.
-
- hdiff -e foo.c newfoo.c
-
- compares foo.c with newfoo.c and prepares an Edlin script. This
- script, if applied to foo, will create a copy of newfoo. The
- script file is sent to the console, so a more useful command is
-
- hdiff -e foo.c newfoo.c > foo.dat
-
- which uses standard DOS redirection to send the Edlin script to
- the disk file foo.dat. Note that the program logo and error
- messages are unaffected by redirection and will always be sent to
- the console device.
-
- hdiff -e4000 foo.c newfoo.c > foo.dat
-
- is equivalent to the previous command, except that it informs
- hdiff that one of the files might contain up to 4000 lines.
-
-
- Report formats
- --------------
-
- The difference report consists of lines in the format:
-
- [nnnni] text
- or
- [nnnnd] text
-
- The 'i' format indicates that the line is new (an insertion); the
- 'd' format indicates that the line is gone (a deletion). Thus:
-
- [ 1d] This line appears in the old file only
- [ 1i] This line appears in the new file only
-
- The 'nnnn' represents the line number. For 'i' lines, it's the
- line number in the new file; for 'd' lines, it's the line number
- in the old file.
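-
- As an illustration of this layout, the sketch below builds and
- reads back a report line. It assumes the line number is
- right-justified in a four-character field, as the 'nnnn'
- placeholder suggests; the function names are hypothetical and
- are not taken from hdiff.
-
```c
#include <assert.h>
#include <stdio.h>

/* Build "[nnnni] text" or "[nnnnd] text"; kind is 'i' or 'd'.
   Returns the length reported by snprintf. */
int format_report_line(char *buf, size_t size, int lineno, char kind,
                       const char *text)
{
    assert(kind == 'i' || kind == 'd');
    return snprintf(buf, size, "[%4d%c] %s", lineno, kind, text);
}

/* Recover the line number and kind from a report line.
   Returns 1 on success, 0 if the line is not in report format. */
int parse_report_line(const char *line, int *lineno, char *kind)
{
    if (sscanf(line, "[%d%c]", lineno, kind) != 2)
        return 0;
    return *kind == 'i' || *kind == 'd';
}
```
-
- A program that post-processes the report only needs the kind and
- the number: 'd' numbers index the old file, 'i' numbers index the
- new one.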
-
- The Edlin script is a series of Edlin commands. See Edlin
- documentation for their meanings; the only commands which will
- appear are I (insert), D (delete), M (move), and E (End). The
- script may look a little strange if you look at it (with an
- editor or via the TYPE command). After the completion of each
- insertion sequence, there will be a heart symbol; this is the
- screen representation of Ctrl-C, which is used to terminate an
- Edlin insertion.
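-
- To make the Ctrl-C terminator concrete, here is a hedged sketch
- of writing one insertion sequence. Only the I command and the
- Ctrl-C byte come from the description above; the exact layout
- (the command on its own line, a newline after the Ctrl-C) is an
- assumption, and emit_insert is a hypothetical name, not a
- routine from hdiff.
-
```c
#include <assert.h>
#include <stdio.h>

#define CTRL_C 0x03                 /* displayed as a heart on the PC screen */

/* Write "<line>I", then the text lines, then the Ctrl-C that ends
   Edlin's insert mode. The new text goes in before before_line. */
void emit_insert(FILE *out, int before_line, const char **lines, int count)
{
    int k;

    assert(out != NULL && count >= 0);
    fprintf(out, "%dI\n", before_line);
    for (k = 0; k < count; k++)
        fprintf(out, "%s\n", lines[k]);
    fputc(CTRL_C, out);             /* terminates the insertion */
    fputc('\n', out);               /* assumed: keep the next command separate */
}
```
-
- Sent to stdout and redirected to a file, this produces the kind
- of script fragment described above, heart symbol included.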
-
-
- Uses
- ----
-
- The simplest use for hdiff is to compare two files to see if they
- are the same. This can be used to check for corruption during
- backups, copies, etc., or to determine which of two files is
- newer. Even this simple use of hdiff can be useful in unexpected
- ways, however. For example, look at this small batch file:
-
- dir a: > temp
- find "-" temp > dir.a
- dir b: > temp
- find "-" temp > dir.b
- hdiff dir.b dir.a > temp.bat
- erase dir.a
- erase dir.b
- erase temp
-
- This batch can be used for a simple backup system. Assume that
- the default directory in drive A contains a series of files that
- you want to back up, and that the default directory in drive B
- contains the same set of files from the last backup. The batch
- will isolate differences between the two directories and prepare
- a file called temp.bat which contains a list of those files which
- have been changed or added since the last backup. (The .bat
- extension is used because many popular text editors could very
- easily convert the temp.bat file to a series of copy commands
- which could be used, in batch mode, to perform the copying.)
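-
- The conversion to copy commands could also be done by a small
- filter instead of a text editor. The sketch below is one way to
- do it, under stated assumptions: DIR lists the file name and the
- extension as two space-separated columns, every file has an
- extension, and only 'i' lines (entries in A that are not in B)
- should be copied. The function name is hypothetical.
-
```c
#include <assert.h>
#include <stdio.h>

/* Turn a report line for an inserted DIR entry into a COPY command.
   Returns 1 and fills out on success, 0 for any other line. */
int report_to_copy(const char *line, char *out, size_t size)
{
    int lineno;
    char kind;
    char name[9], ext[4];           /* 8.3 names plus terminators */

    assert(line != NULL && out != NULL);
    if (sscanf(line, "[%d%c] %8s %3s", &lineno, &kind, name, ext) != 4)
        return 0;
    if (kind != 'i')
        return 0;                   /* deletions are not copied */
    snprintf(out, size, "copy a:%s.%s b:", name, ext);
    return 1;
}
```
-
- A file with no extension would be misread by this sketch (the
- size column would be taken for the extension), so it would need
- extra handling before being relied on for real backups.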
-
- The "Edlin" mode potentially has much more significant uses.
- Perhaps its greatest potential lies in what are known as "source
- code control systems". These systems, quite common in mainframe
- and minicomputer systems, allow programmers to maintain many
- generations of program source text quite economically; rather
- than storing each modified file in its entirety, only the
- original is saved, along with a series of difference files.
-
- Hdiff provides a first step in this direction for MSDOS machines
- (see the "Plans" section below). Typical use of the current
- hdiff would be something like this. Assume that hdbase.c
- contains an "original" version of hdiff; the current version
- (1.10) is hdiff.c. First, the command
-
- hdiff -e hdbase.c hdiff.c > hd110.dat
-
- will create an Edlin script which would convert hdbase.c into
- version 1.10 of hdiff.c. Typically, the actual hdiff.c file
- would then be discarded (WARNING: see below. This program is
- experimental!). As newer versions are developed, the same
- procedure is used to create hd111.dat, hd120.dat, etc. Note that
- these difference files would, in all likelihood, be much smaller
- than the total size of all of the versions.
-
- In order to "retrieve" an earlier version, say 1.00, the commands
-
- copy hdbase.c hdiff.c
- edlin hdiff.c < hd100.dat
-
- would convert hdbase.c into version 1.00 of hdiff.
-
- True source code control systems are considerably more efficient
- than this "by hand" method, are much easier to use, and provide
- significant features beyond mere storage of multiple versions.
-
- For whatever it's worth, note that
-
- hdiff -e file1 file2 | edlin file1
-
- is roughly equivalent to
-
- copy file2 file1
-
- except that the original file1 is saved in file1.bak.
-
-
- cdelta and cget
- ---------------
-
- The two demonstration batches, cdelta and cget, provide a quick
- sample of the kinds of things that can be done with hdiff and
- edlin. The two batches are designed for C programs; to revise
- them for other languages, simply replace all references to ".c"
- with the desired extension (.asm, for example).
-
- The purpose of cdelta is to generate a change script which will
- convert a "base" source file into a specified version of your
- source. Cget performs the inverse task; it applies a specified
- change file to the base and produces a file containing the
- specified version. File naming conventions are as follows:
-
- file.scc: "base" source; scc = source code control
- file.###: A change script to produce version ###
- file.c: The current version (cdelta), or the
- output file (cget)
-
- For example, suppose you are working with a C program called foo.
- A base (earliest) version of this file should be in foo.scc. You
- have just finished revision 1.10 of foo. To create the change
- file, type
-
- cdelta foo 110
-
- The batch will create a new file, foo.110; this file is an Edlin
- script which will convert foo.scc into version 1.10 of foo.c.
-
- To retrieve a specified version, say 1.05, use
-
- cget foo 105
-
- The batch will apply the script foo.105 to foo.scc and produce
- foo.c, which will contain the source for version 1.05.
-
- Note that cget always creates a file called file.c, overwriting
- any existing file by that name. This implies that you do NOT
- keep your current source in file.c; you keep the current source
- only by retaining file.scc and the delta files.
-
- The library contains a set of small files which demonstrate the
- use of cdelta and cget. The files are: demo.scc (the base file),
- demo.110, demo.120, and demo.125 (difference files for versions
- 1.10, 1.20, and 1.25), and demo125.c (the full text of version
- 1.25 of the demo program, for comparison purposes).
-
- After you have removed all of the files from the library, I
- suggest that you pass all of the demo files through your text
- editor; the librarian program pads the ends of the files to even
- multiples of bytes, and the demonstration will not work properly
- until that padding has been removed.
-
- Try them out; for example, to get version 1.25 of demo, type
-
- cget demo 125
-
- Compare this file (demo.c) to demo125.c; they should be
- identical. Make a few changes and save the file, then type
-
- cdelta demo 130
-
- You now have a change file for your new version. Erase demo.c
- and try
-
- cget demo 130
-
- You should have a duplicate of your version 1.30. (The demo
- "program", by the way, does nothing. I haven't even compiled it,
- so there may be errors.)
-
-
- Memory management
- -----------------
-
- Hdiff uses all available memory. The purpose of the -nnnn (max
- number of lines) switch is to allow it to use memory more
- efficiently, and to allow you to more effectively use hdiff in
- very small or very large machines. This is how it works.
-
- For each *potential* line, hdiff requires approximately 34 bytes
- of storage for various tables. The default configuration (space
- for 2000 lines) will thus require about 68K bytes of data space
- for the tables. The remainder of available memory (less the size
- of the program itself and a much smaller amount of overhead data)
- is used to store the text read from the files. Text storage
- space is required for each *unique* line in either file.
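-
- The arithmetic above can be written out as a short sketch. The
- 34-byte figure and the 5000-line ceiling come from this
- document; the function names and the idea of passing the program
- overhead as a parameter are assumptions made for illustration.
-
```c
#include <assert.h>

#define TABLE_BYTES_PER_LINE 34L    /* "approximately 34 bytes" per line */

/* Approximate table space needed for a given -nnnn value. */
long table_bytes(int max_lines)
{
    assert(max_lines > 0 && max_lines <= 5000);
    return TABLE_BYTES_PER_LINE * max_lines;
}

/* Space left for line text once the tables and the program's own
   overhead are set aside. A negative result means the tables alone
   do not fit, so a smaller -nnnn value is needed. */
long text_bytes(long available, long overhead, int max_lines)
{
    return available - overhead - table_bytes(max_lines);
}
```
-
- With the default of 2000 lines, table_bytes gives 68,000 bytes,
- the "about 68K" quoted above; at the 5000-line maximum it gives
- 170,000 bytes.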
-
- If you have a small machine (i.e., less RAM), that much table
- space will leave very little room for text storage; it may even
- be more space than is available, and the program will not run at
- all. If you find this to be the case, try reducing the number of
- lines via the switch (-1000 or -500, for example).
-
- Conversely, if you have a very large machine, you will have
- plenty of space available to process files larger than 2000 lines.
- If that is the case, increase the maxlines switch as necessary
- (but remember that in no case can maxlines exceed 5000).
-
- When hdiff is finished, it displays a message like:
-
- Storage use: 19%
-
- This message tells you approximately what percentage of the total
- available memory was actually used.
-
-
- Restrictions
- ------------
-
- The following act, in one way or another, as restrictions on
- hdiff:
-
- -- File format. Hdiff is intended as a DOS text file comparator
- only. It is not a replacement for the DOS utility 'comp'. Don't
- use it on binary (program or data) files, or on word processor
- files if they contain embedded control codes.
- -- Available memory (as discussed above)
- -- Actual size of the files. Edlin will read a file only until
- 75% of its available memory is filled. Since Edlin uses only a
- maximum of 64K, this means that it will read only 48K of text.
- Hdiff cannot account for this problem, so the absolute maximum
- file size it can handle is approximately 48K.
- -- Line size. Limited to a maximum of 255 characters/line.
- -- Use of tabs. If your text editor gives you a choice in saving
- text with or without tabs, try to be consistent. That is, try
- to avoid using hdiff on one file saved with tabs, and one file
- saved without.
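-
- Two of these restrictions are simple enough to check before
- running hdiff. The sketch below does so for text already read
- into memory. It treats 75% of 64K (49,152 bytes) as a hard limit
- and does not count CR or LF toward the 255-character line
- length; both choices are assumptions, since the figures above
- are approximate. The names are hypothetical.
-
```c
#include <assert.h>
#include <stddef.h>

#define EDLIN_MAX_BYTES ((size_t)64 * 1024 * 3 / 4)   /* 75% of 64K */
#define MAX_LINE_CHARS  255

enum limit_result { LIMITS_OK, FILE_TOO_BIG, LINE_TOO_LONG };

/* Check a text buffer against the size and line-length limits. */
enum limit_result check_limits(const char *text, size_t len)
{
    size_t i, line_len = 0;

    assert(text != NULL || len == 0);
    if (len > EDLIN_MAX_BYTES)
        return FILE_TOO_BIG;        /* Edlin would stop reading early */

    for (i = 0; i < len; i++) {
        if (text[i] == '\n')
            line_len = 0;           /* a new line starts here */
        else if (text[i] != '\r' && ++line_len > MAX_LINE_CHARS)
            return LINE_TOO_LONG;
    }
    return LIMITS_OK;
}
```
-
- Both the old and the new file would need to pass, since the
- Edlin script is applied to the first and must reproduce the
- second.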
-
-
- A Warning and A Plan
- --------------------
-
- Hdiff is experimental! It has been tested, but there have not
- been sufficient tests to state with any great degree of
- confidence that it will perform "as advertised" with all possible
- files. Please bear this in mind as you use it. Please report
- any problems to me.
-
- I intend, at some "unspecified future time", to incorporate hdiff
- or a version of it in a larger source code control system. This
- system would allow you to maintain multiple generations of
- program source files very efficiently (in terms of storage
- requirements). Some knotty problems relating to performance on a
- standard-issue PC remain to be solved. Comments or suggestions
- relating to this system are welcome. Tell me what you would like
- to see.
-
-
- ---------------------
-
- hdiff and this document are
- Copyright (c) 1984 by:
-
- Christopher J. Dunford
- 10057-2 Windstream Drive
- Columbia, Maryland 21044
- CompuServe 71076,1115
- Source STR211
-
- You may copy and use hdiff for your personal use only. You may
- copy hdiff for others, but you may not charge them for it. You
- may not use hdiff for any commercial purpose whatsoever. Address
- comments to the author at the above address, at CompuServe
- (preferably) or at the Source (occasionally).
-
- Hdiff is written in C and compiled using the Computer Innovations
- C86 compiler (Version 2.13), big model.